DATA PROGRAMMING

PROJECT

Author

Yusuf Emre KILICER

Published

June 9, 2024

Hello, I am Yusuf Emre Kilicer. Welcome to my project on Obesity. This project involves a comprehensive analysis of a dataset related to obesity, focusing on various factors such as eating habits, physical activity levels, and family history. The goal is to uncover patterns and insights that can contribute to our understanding of obesity and its associated factors.

In this project, I undertook the following steps:

  • Data Cleaning and Preprocessing

  • Exploratory Data Analysis

  • Data Manipulation

  • Visualization

  • Animated Visualizations

  • Tables

Throughout the project, you will find detailed explanations of each step, along with the corresponding R code and outputs. These elements together aim to provide a thorough and engaging analysis of obesity-related data.

I hope you find the project informative and engaging.

KIND REGARDS

EMRE

Pozdravljeni, sem Yusuf Emre Kilicer. Dobrodošli v mojem projektu o debelosti. Ta projekt vključuje celovito analizo podatkov, povezanih z debelostjo, s poudarkom na različnih dejavnikih, kot so prehranjevalne navade, stopnje telesne aktivnosti in družinska anamneza. Cilj je odkriti vzorce in vpoglede, ki lahko prispevajo k našemu razumevanju debelosti in z njo povezanih dejavnikov.

V tem projektu sem izvedel naslednje korake:

  • Čiščenje in predobdelava podatkov

  • Raziskovalna analiza podatkov

  • Manipulacija s podatki

  • Vizualizacija

  • Animirane vizualizacije

  • Tabele

V celotnem projektu boste našli podrobne razlage vsakega koraka, skupaj z ustrezno kodo v jeziku R in izhodi. Ti elementi skupaj si prizadevajo zagotoviti temeljito in zanimivo analizo podatkov, povezanih z debelostjo.

Upam, da boste projekt našli poučen in zanimiv.

PRIJAZNO SPOŠTOVANJE

EMRE

🤫 ADMISSION (click to expand) 🙈 !

I used google translate for greeting Slovenian.

Thus , I don’t know if it was translated correctly…

WELCOME TO MY PROJECT

TIME TO SHOW MY MASTERY OF R LANGUAGE

IMPORTING LIBRARIES

Below are the libraries necessary for this project:

library(tibble)
library(knitr)
library(dplyr)
library(janitor)
library(tidyverse)
library(ggplot2)
library(DT)
library(gganimate)
library(patchwork)
library(gifski)

IMPORTING DATASET

Let’s start with loading the data first.

my_data <-read.csv("/Users/y.emrekilicer/Desktop/DATA KONU PROJE ODEV/DATA PROJE/ObesityDataSet_raw_and_data_sinthetic.csv")

NOW LET’S SEE WHAT WE GET

NOTE THAT EVERY NECESSARY INFORMATION ABOUT COLUMNS, UNITS ETC WILL BE PRINTED LATER.THIS IS JUST FOR GENERAL CHECKING IF DATA IMPORTED CORRECTLY.

my_data %>%
  datatable(
    caption="THE GENERAL OVERVIEW OF IMPORTED DATA",
    options = list(
      pageLength = 4,
      autoWitdh = TRUE
    ))

DATA CLEANING

Before starting the data manipulation, it is crucial to ensure no duplicated rows or missing values exist in the dataset.

# Clean and standardize column names to lowercase and remove special characters for consistency
my_data <- janitor::clean_names(my_data)

# Identify and display duplicated rows if any exist

duplicated_df <- my_data %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()


duplicated_df %>%
  head(4) %>%
  gt::gt()
age gender height weight calc favc fcvc ncp scc smoke ch2o family_history_with_overweight faf tue caec mtrans n_obeyesdad
21 Female 1.52 42 Sometimes no 3 1 no no 1 no 0 0 Frequently Public_Transportation Insufficient_Weight
21 Female 1.52 42 Sometimes no 3 1 no no 1 no 0 0 Frequently Public_Transportation Insufficient_Weight
25 Female 1.57 55 Sometimes yes 2 1 no no 2 no 2 0 Sometimes Public_Transportation Normal_Weight
25 Female 1.57 55 Sometimes yes 2 1 no no 2 no 2 0 Sometimes Public_Transportation Normal_Weight
# Remove duplicated rows using the distinct function
prev_num_rows <- nrow(my_data)
my_data <- distinct(my_data)
new_num_rows <- nrow(my_data)

message <- paste("The number of rows has been updated to ", new_num_rows, ". Previously, it was ", prev_num_rows, ".", sep="")
cat(message, "\n")
The number of rows has been updated to 2087. Previously, it was 2111. 
# Check for missing values in the dataset
if (any(is.na(my_data))) {
  cat("Warning: The dataset contains missing values. Please handle them with caution.\n")
} else {
  cat("There are no missing values in the dataset.\n")
}
There are no missing values in the dataset.

GENERAL OVERVIEW

#| label: classnames


my_data <- clean_names(my_data)

# Retrieve and print the column names and their data types.
column_information <- my_data %>%
                      sapply(class) %>%
                      as.data.frame() %>%
                      rownames_to_column(var = "Column Name") %>%
                      rename(Class = 2)

# Display the columns and their classes in a table.
column_information %>%
  datatable(caption = "Column Names and Their Classes",options= list(  pageLength=8,
autoWidth =TRUE
  ))
NOTE!!: TO SEE A DETAILED INFORMATION ABOUT THE COLUMN , CLICK HERE

Detailed Column Information
cat(paste("The number of columns is ", ncol(my_data)))
The number of columns is  17
cat(paste("The number of rows is ", nrow(my_data)))
The number of rows is  2087
# Display the first 10 rows of our data for a preview.
my_data %>%
  head(10) %>%
  kable(format = "html", caption = "The Obesity DataSet Raw and Synthetic Data")
The Obesity DataSet Raw and Synthetic Data
age gender height weight calc favc fcvc ncp scc smoke ch2o family_history_with_overweight faf tue caec mtrans n_obeyesdad
21 Female 1.62 64.0 no no 2 3 no no 2 yes 0 1 Sometimes Public_Transportation Normal_Weight
21 Female 1.52 56.0 Sometimes no 3 3 yes yes 3 yes 3 0 Sometimes Public_Transportation Normal_Weight
23 Male 1.80 77.0 Frequently no 2 3 no no 2 yes 2 1 Sometimes Public_Transportation Normal_Weight
27 Male 1.80 87.0 Frequently no 3 3 no no 2 no 2 0 Sometimes Walking Overweight_Level_I
22 Male 1.78 89.8 Sometimes no 2 1 no no 2 no 0 0 Sometimes Public_Transportation Overweight_Level_II
29 Male 1.62 53.0 Sometimes yes 2 3 no no 2 no 0 0 Sometimes Automobile Normal_Weight
23 Female 1.50 55.0 Sometimes yes 3 3 no no 2 yes 1 0 Sometimes Motorbike Normal_Weight
22 Male 1.64 53.0 Sometimes no 2 3 no no 2 no 3 0 Sometimes Public_Transportation Normal_Weight
24 Male 1.78 64.0 Frequently yes 3 3 no no 2 yes 1 1 Sometimes Public_Transportation Normal_Weight
22 Male 1.72 68.0 no yes 2 3 no no 2 yes 1 1 Sometimes Public_Transportation Normal_Weight

DEEPING INTO DATASET

# Calculate descriptive statistics only for numeric columns in the dataframe my_data
additional_manipulate <- my_data %>%
  # Apply summary functions across all numeric columns
  summarise(across(
    where(is.numeric),  # Apply the functions only to numeric columns
    list(
      count = ~length(.),  # Count total values
      mean = ~mean(.),  # Calculate mean
      std = ~sd(.),  # Calculate standard deviation
      min = ~min(.),  # Find minimum value
      `50%` = ~median(.),  # Median
      max = ~max(.)  # Find maximum value
    )
  )) %>%
  # Keep data in a long format for clarity
  pivot_longer(
    cols = everything(), 
    names_to = c("Column", ".value"), 
    names_pattern = "(.*)_(.*)"
  )

# Output the descriptive statistics table

additional_manipulate %>%
  gt::gt()
Column count mean std min 50% max
age 2087 24.3530898 6.36880144 14.00 22.847618 61.00
height 2087 1.7026741 0.09318594 1.45 1.701584 1.98
weight 2087 86.8587296 26.19084708 39.00 83.101100 173.00
fcvc 2087 2.4214662 0.53473659 1.00 2.396265 3.00
ncp 2087 2.7011791 0.76461449 1.00 3.000000 4.00
ch2o 2087 2.0047494 0.60828437 1.00 2.000000 3.00
faf 2087 1.0128119 0.85347467 0.00 1.000000 3.00
tue 2087 0.6630354 0.60815328 0.00 0.630866 2.00
  • Based on the dataset description, it’s known that 77% of the data was synthetically generated using the Weka tool and the SMOTE filter, while the remaining 23% was directly collected from users through a web platform.

  • Click to learn organic -synthetic data

# Group the organic from synthetic data based on the dataset description.
df_organic <- my_data[1:488, ]
df_synthetic <- my_data[489:nrow(my_data), ]

# Display samples from both subsets to understand their characteristics.
cat("Last 3 rows of Organic Data\n")
Last 3 rows of Organic Data
kable(tail(df_organic, 3), caption = "Organic Data - Last 3 Rows")
Organic Data - Last 3 Rows
age gender height weight calc favc fcvc ncp scc smoke ch2o family_history_with_overweight faf tue caec mtrans n_obeyesdad
486 18 Female 1.67 66 Sometimes yes 3 3 no no 2 no 0 0 Sometimes Public_Transportation Normal_Weight
487 19 Male 1.80 60 no yes 3 1 yes no 1 yes 0 0 Always Motorbike Normal_Weight
488 20 Male 1.56 45 Sometimes no 2 3 no no 2 no 1 1 Sometimes Public_Transportation Normal_Weight
cat("First 3 rows of Synthetic Data\n")
First 3 rows of Synthetic Data
kable(head(df_synthetic, 3), caption = "Synthetic Data - First 3 Rows")
Synthetic Data - First 3 Rows
age gender height weight calc favc fcvc ncp scc smoke ch2o family_history_with_overweight faf tue caec mtrans n_obeyesdad
489 25.19621 Female 1.686306 104.5727 Sometimes yes 3 3 no no 1.152736 yes 0.319156 1.00000 Sometimes Public_Transportation Obesity_Type_III
490 18.50334 Female 1.683124 126.6738 Sometimes yes 3 3 no no 1.115967 yes 1.541072 1.00000 Sometimes Public_Transportation Obesity_Type_III
491 26.00000 Female 1.622397 110.7926 Sometimes yes 3 3 no no 2.704507 yes 0.000000 0.29499 Sometimes Public_Transportation Obesity_Type_III

VISUALIZATION WITH DATA MANIPULATION

# Analyze and visualize the most common ages, weights, genders 
#and heights in the dataset.

total_of_group <- function(data, group_of, top_n) {
  
  result_table <- data %>%
    group_by({{ group_of }}) %>%
    summarize(Count = n()) %>%
    arrange(desc(Count)) %>%
    top_n(top_n, wt = Count)
  
    # Calculate peak values
  peak_values <- result_table %>%
    mutate(ranking = row_number())
   
  
  ggplot(result_table, aes(x = reorder({{ group_of }}, -Count), y = Count)) +
    geom_bar(stat = "identity", fill = "yellow", color = "black") +
    geom_text(aes(label = Count), vjust = -0.5, size = 3) +  
    geom_line(data = peak_values, aes(group = 1, x = reorder({{ group_of }}, -Count), y = Count), color = "red") +
    labs(title = paste("Most Common", deparse(substitute(group_of))), x = deparse(substitute(group_of)), y = "Count")+
     theme(axis.text.x =element_text(color = "red"),
           axis.text.y= element_text(color = "red"),
           plot.title = element_text(face = "bold.italic", size = 14,color = "blue",family ="Times New Roman"), 
           axis.title.x = element_text(face = "bold.italic"),           
           axis.title.y = element_text(face = "bold.italic")  )
    
}
# Let's visualize the ages and also weights.
total_of_group(my_data, age, 20)

total_of_group(my_data, weight, 20)

THIS TYPE OF GRAPHIC WILL BE IN USE A LOT IN THE REST OF THE PROJECT, IN ORDER TO HAVE A BETTER UNDERSTANDING OF THE RELATIONSHIPS.

NOTE ! : TO SEE THE COUNTS OF OTHER COLUMNS , JUST CLICK HERE.

total_of_group(my_data, height, 20)

total_of_group(my_data,mtrans,6)

total_of_group(my_data,gender,2)

# Group by obesity level smoking , gender and alcohol consumption.
#Because we need these information to understand the relationship between obesity and other factors.
   rs <- my_data %>%
     group_by(calc,n_obeyesdad,smoke,gender) %>%
     summarize(count = n())
#Then , let's visualize it.

 ggplot(rs, aes(x = smoke, y = count, fill = n_obeyesdad)) + 
  geom_bar(stat = "identity", position = "dodge") +
  facet_grid(gender~calc ) +
  labs(
    title = "Relationship Between Alcohol Consumption, Smoking,Gender and Obesity",
    subtitle = "              ALCOHOL CONSUMPTION         ",
    x = "SMOKING",
    y = "Total Number",
    fill = "Obesity Levels"
  ) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(face = "bold.italic",hjust = 0.5,color = "red"),
    axis.text.x = element_text( hjust = 1 , face ="bold.italic",colour ="black"),
    axis.text.y = element_text(face ="bold.italic",colour ="black"),
    axis.title.x = element_text(face = "bold.italic", color = "red" , family = "Times New Roman" ,size = 14),
    axis.title.y = element_text(face = "bold.italic", color = "black" , family = "Times New Roman",size =14),
    legend.position = "bottom"
  )

IN THIS DATASET 2043 PEOPLE DO NOT SMOKE, WHICH MEANS THE NUMBER OF NON-SMOKERS IS QUITE HIGH. IN CONTRAST, ONLY 44 PEOPLE SMOKE. THEREFORE, THE COMPARISON REGARDING SMOKING IS NOT BALANCED SO WELL

 total_of_group(my_data,smoke,2)

JUST AS ABOVE, THE PEOPLE WHO SAY THEY “ALWAYS” DRINK ARE ONLY “1”. THAT’S WHY THE GRAPH OF “ALWAYS” IS NOT SO BALANCED.

total_of_group(my_data,calc,4)

# Create a new feature for activity levels
my_data <- my_data %>%
  mutate(
    activity_level = case_when(
      faf == 0 ~ "No Activity",
      faf < 1 ~ "Low Activity",
      faf < 3 ~ "Decent Activity",
      .default = "High Activity"
    )
  )

#Grouping again to see the relationship between activity level and transportation preferences.
data <- my_data %>%
  group_by(mtrans, activity_level, family_history_with_overweight) %>%
  summarise(total = n(), .groups = 'drop')
ggplot(data, aes(x = total, y = mtrans, fill = mtrans)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Relation between activity level, preferences of transportation and family history of overweight",
    subtitle ="                 History of overwight from your family  ",
    x = "Total Count",
    y = "Transportation Type"
  ) +
  theme(axis.text.x = element_text(hjust =1,color = "red",face = "bold.italic"), 
        plot.title = element_text(face = "bold", hjust = 0.5 ,size =10,family = "Times New Roman",color = "blue"),
                plot.subtitle = element_text(face = "bold", hjust = 0.5 ,size =8,family = "Times New Roman",color = "red"),

        axis.text.y = element_text(angle = 45, hjust = 1, face = "bold.italic"),
        legend.position ="bottom"
        ) +
  facet_grid(activity_level ~ family_history_with_overweight)

  • # Density plot for distribution of weight by obesity levels
    
     ggplot(my_data, aes(x = weight , fill = n_obeyesdad))+
      geom_density(alpha = 0.8) +  # Density plot
      labs(title = "Distribution of Weight by Obesity Levels", x = "Weight", y = "Density") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1 , face = "bold.italic"),
     axis.text.y = element_text( face ="bold.italic",colour ="black"))

To see the percentages of obesity levels in this dataset , let’s print another graphic.

# Calculate the percentage of each obesity level
data <- my_data %>%
  count(n_obeyesdad) %>%
  mutate(percentage = n / sum(n) * 100)
# Create the pie chart
ggplot(data, aes(x = "", y = percentage, fill = n_obeyesdad)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar(theta = "y", start = pi) +  # Start at 180 degrees
  labs(fill = "OBESITY LEVELS") +
  theme_void() +  # Remove background and axis
  geom_text(aes(label = sprintf("%.2f%%", percentage)), 
            position = position_stack(vjust = 0.5), size = 5) +
  theme(legend.title = element_text(size = 15), 
        legend.text = element_text(size = 12))

DATA MANIPULATION

To understand what is BMI , please click here

I think everyone knows the formula of BMI.Let me write it again :

\[ BMI = \frac{Weight}{Height^{2}} \]

#Now , it is time to add new features to our dataset.
#First , let's calculate the BMI of every person.
  
calc_bmi <- \(dataset) {
      dataset %>%
      mutate(BMI = weight  / (height * height) )
  
}

my_data <- calc_bmi(my_data) 

#Let's check if it was added as new column.
my_data %>% 
        head(3) %>%
        datatable()
#I want to show relation between family history with overweight relation and obesity level.

 relations<-my_data %>%
  group_by(n_obeyesdad, family_history_with_overweight,gender) %>%
  summarise(count = n())
  

#As we can see, people who are overweight or have more weight than normal often have a relationship with their families overweight.
#LET'S ALSO VISUALIZE THIS :


ggplot(relations, aes(x = gender, y = count, fill = n_obeyesdad)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = round(count, 2)), position = position_dodge(width = 0.9), vjust = -0.5, size = 3) +
  labs(
    title = "Obesity Levels by Family History of Overweight over Gender",
    subtitle = "        Do you have a history from your family of Overweight ? ",
    x = "Gender",
    y = "Total Number",
    fill = "Obesity Level"
  ) +
  facet_wrap(~family_history_with_overweight)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1,face="bold.italic"))

# Group by obesity level and summarize mean BMI , weight ,height and age
  obesity_summary <- my_data %>%
    group_by(n_obeyesdad) %>%
    summarise(
      mean_BMI = mean(BMI),
      mean_age = mean(age),
      mean_wgh= mean(weight),
      mean_hgh = mean(height),
      total_num = n(),
      
    )
  

  
obesity_summary %>%
  arrange(desc(mean_BMI)) %>%
  datatable(
    caption = "Summary Statistics by Obesity Level",
    options = list(
      pageLength = 5,
      autiWidth =TRUE
    )
  )
# Convert mean_age to a factor

obesity_summary <- obesity_summary %>%
  mutate(mean_age_factor = factor(round(mean_age)))
# Visualization
ggplot(obesity_summary, aes(
  x = mean_age_factor,
  y = mean_BMI,
  fill = n_obeyesdad)) +
  transition_reveal(along = mean_age) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.5) +
  ylab("Mean of Age") +
  ggtitle("Mean BMI by Obesity Level") +
  labs(title = "Mean of AGE over BMI with Animation",
       x = "Mean of Age",
       y = "Mean of BMI") +
  scale_x_discrete() +
  scale_y_continuous(breaks = seq(0, max(obesity_summary$mean_BMI), by = 5)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(family = "Times New Roman", colour = "blue", face = "bold.italic"))

# Rearrange columns to move BMI to the front
my_data <- my_data %>%
  relocate(BMI, .after = age)

# Display the rearranged data
my_data %>%
  head(2) %>%
  datatable(caption = "Data with BMI Column Relocated")
# Create a new feature: Age group

my_data <- my_data %>%
  mutate(
    age_group = case_when(
      age < 20 ~ "Teenager",
      age < 35 ~ "Young Adult",
      age < 50 ~ "Middle Age",
      .default = "Senior"
    )
  )


# Group by age group and summarize mean BMI
age_group_summary <- my_data %>%
  group_by(age_group) %>%
  summarise(
    mean_BMI = mean(BMI),
    mean_weight = mean(weight),
    count = n()
  )

# Display the age group summary
age_group_summary %>%
  datatable(caption = "Summary Statistics by Age Group")
# Plot mean BMI by age group

ggplot(my_data, aes(x = age_group, y = BMI, fill = age_group)) +
  geom_violin(alpha = 1) +
  labs(
    title = "Graph of BMI by Age Group",
    subtitle = "                  The history of obesity in the family",
    x = "Age Group",
    y = "BODY MASS INDEX"
  ) +
  facet_grid( gender ~ family_history_with_overweight) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, face = "bold.italic"),
    legend.position = "bottom",
    legend.title = element_text(face = "bold.italic", size = 10, color = "purple" ),
    plot.title = element_text(face = "bold.italic", size = 14)
  )

Individuals with a family history of obesity tend to have higher BMI levels. This correlation suggests a significant genetic or environmental influence on obesity within families.

# Distribution of BMI by obesity level
ggplot(my_data, aes(x = BMI, fill = n_obeyesdad)) +
  geom_density(alpha = 0.8) +
  labs(
    title = "Distribution of BMI by Obesity Level",
    x = "BMI",
    y = "Density"
  ) +
  theme_minimal()

# Scatter plot of weight vs. height colored by obesity level
ggplot(my_data, aes(x = height * 100, y = weight, color = n_obeyesdad)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm")+
  labs(
    title = "Weight vs. Height by Obesity Level",
    x = "Height (cm)",
    y = "Weight (kg)"
  ) 

# Additional multi-line plot for summary statistics
obesity_summary_long <- obesity_summary %>%
  pivot_longer(cols = c(mean_BMI, mean_age,mean_wgh,total_num,mean_hgh), 
               names_to = "variable", 
               values_to = "value")
# Convert n_obeyesdad to a numeric type
#Because it will be used as transition_reveal , it must be numeric to add animation.

obesity_summary_long <- obesity_summary_long %>%
  mutate(n_obeyesdad_numeric = as.numeric(factor(n_obeyesdad, levels = unique(n_obeyesdad))))
ggplot(obesity_summary_long, aes(x = n_obeyesdad, y = value, color = variable, group = variable)) +
  geom_line(size = 1) +  # Draw lines
  geom_point(size = 3) +  # Add points+
  transition_reveal(along = n_obeyesdad_numeric) +  # Add transition along the x-axis (n_obeyesdad)
  labs(
    title = "Summary Statistics by Obesity Level",
    x = "Obesity Level",
    y = "Value",
    color = "Variable"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Mean of height is too close to each other between 169 to 171 , that’s why line looks like constant , but actually it is not.There is too small differences.

my_data <- my_data %>%
  mutate(fcvc = case_when(
    fcvc <1.6 ~ "Never",
    fcvc <2.3 ~ "Sometimes",
    .default = "Always"
  ))

  
st <-my_data %>%
   group_by(n_obeyesdad,fcvc,favc) %>%
   summarize(count = n())
ggplot(st, aes(x = count, y = n_obeyesdad, fill = n_obeyesdad)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_grid(fcvc ~ favc) + #rows for fcvc , columns with facv
  labs(
    title = "Relations With High Calorie Eater , Vegetable Frequency, and Obesity Levels",
    subtitle = "                                     DO YOU EAT HIGH CALORIE FOOD ?",
    y= "The frequency of eating vegetable",
    x = "Total Number",
    fill = "Obesity Level"
  ) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.text.y= element_text(angle = 30, hjust = 1,size=8),
    legend.position = "bottom"
  )

The strange thing about this graph is that in the high calorie diet group, those who ate more vegetables appeared more obese than those who ate fewer vegetables… To be honest , I didn’t expect that result.It may be cause of the distribution of choices is not similar to each other.

total_of_group(my_data,fcvc,15)

# Visualize the distribution of activity levels by obesity level
activity_summary <- my_data %>%
  group_by(n_obeyesdad, activity_level) %>%
  summarise(count = n(), .groups = 'drop')
ggplot(activity_summary, aes(x = activity_level, y = count, fill = n_obeyesdad)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Activity Levels by Obesity")

#Before start the analyse , I should make the values of mutate integer , otherwise there are plenty of float numbers.

my_data <- my_data %>%
  mutate(tue = case_when(
    tue < 0.5 ~ 0,
    tue < 1.5 ~ 1,
    tue < 2.5 ~ 2,
   .default = 3
  ))
my_data <- my_data %>%
  mutate(Tech_Device_Hours = case_when(
    tue <1 ~ "0-2 Hours",
    tue <2 ~ "3-5 Hours",
    .default = "More than 5 hours" ))


statistic <- my_data %>%
  group_by(Tech_Device_Hours, activity_level) %>%
  summarize(count = n(), .groups = 'drop')
ggplot(statistic, aes(x = activity_level, y = count)) +
  geom_bar(stat = "identity", fill = "blue") +
  transition_states(states = Tech_Device_Hours, transition_length = 2, state_length = 1) +
  labs(
    title = "Activity Level Distribution by Tech Device Hours per day",
    subtitle = 'Tech Device Hours: {closest_state}',  # {closest_state} dynamically updates to reflect the current Tech_Device_Hours state in the animation
    x = "Activity Level",
    y = "Total numbers"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1 , face = "bold.italic" , color = "black"),
        plot.subtitle = element_text(face = "bold.italic" , color = "red",family = "Times New Roman")
  )

Hyperlink

You can find more information on obesity on the World Health Organization’s website.

REFERENCES FROM references.bib

REFERENCES FROM references.bib

References

Brown, L., and M. Green. 2020. “The Impact of Family History on Obesity: A Review.” Journal of Obesity Research 22 (3): 234–45.
Centers for Disease Control and Prevention. 2020. “Adult Obesity Facts.” https://www.cdc.gov/obesity/data/adult.html.
Johnson, Emma. 2018. Nutrition and Obesity. New York: Health Press.
Smith, John. 2019. “The Growing Concerns of Obesity in Modern Society.” Journal of Health 10 (2): 123–34.
World Health Organization. 2021. “Obesity and Overweight.” https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight.